1. Introduction. We first would like to congratulate the authors for their
interesting paper on the development of the innovative equi-energy (EE)
sampler. The EE sampler provides a solution, which may be better than
existing methods, to a challenging MCMC sampling problem, that is, sampling
from a multimodal target distribution $\pi(x)$. The EE sampler can be
understood as follows. In the equi-energy jump step, (i) points may move 
within the same mode; or (ii) points may move between two modes; but 
(iii) points cannot move from one energy ring to another energy ring. In the 
Metropolis-Hastings (MH) step, points move locally. Although in the MH 
step, points may not be able to move freely from one mode to another mode, 
the MH step does help a point to move from one energy ring to another
energy ring locally. To maintain a certain balance between these two types of
operations, an EE jump probability $p_{ee}$ must be specified. Thus, the MH
move and the equi-energy jump play distinct roles in the EE sampler. This 
unique feature makes the EE sampler quite attractive in sampling from a 
multimodal target distribution. 
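
The division of labor between the two moves can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional state, a tempered and truncated energy of the form max(h(x), H_i)/T_i at level i, and a stored list of ring samples from the next higher-order chain. The names `ring_index`, `ee_step`, `rings_above` and `step` are hypothetical.

```python
import bisect
import math
import random


def ring_index(energy, levels):
    """Index j with levels[j] <= energy < levels[j+1]; the last ring is unbounded."""
    return max(bisect.bisect_right(levels, energy) - 1, 0)


def ee_step(x, h, rings_above, levels, H_i, T_i, H_up, T_up, p_ee, step, rng):
    """One update of a 1-D chain at level i (illustrative sketch only).

    rings_above[j] holds stored states of the higher-order chain whose
    energy falls in ring j. All argument names are assumptions.
    """
    def h_i(z):      # assumed energy at level i
        return max(h(z), H_i) / T_i

    def h_up(z):     # assumed energy at level i + 1
        return max(h(z), H_up) / T_up

    j = ring_index(h(x), levels)
    if rings_above[j] and rng.random() < p_ee:
        # Equi-energy jump: the candidate comes from the same energy ring,
        # so it may sit in another mode but never in another ring.
        y = rng.choice(rings_above[j])
        log_ratio = (h_i(x) - h_i(y)) + (h_up(y) - h_up(x))
    else:
        # Local MH move: a random-walk proposal that can cross rings locally.
        y = x + rng.gauss(0.0, step)
        log_ratio = h_i(x) - h_i(y)
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return y if accept else x
```

With `p_ee` close to 1 the chain relies almost entirely on stored ring samples, and with `p_ee` close to 0 it reduces to a tempered random-walk MH chain, which is why the balance between the two has to be chosen.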

2. Tuning and "black-box." The performance of the EE sampler depends
on the number of energy and temperature levels, $K$, the energy levels
$H_0 < H_1 < \cdots < H_K < H_{K+1} = \infty$, the temperature ladder
$1 = T_0 < T_1 < \cdots < T_K$, the MH proposal distribution, the proposal
distribution used in the equi-energy jump step and the equi-energy jump
probability $p_{ee}$. Based on our experience in testing the EE sampler, we
felt that the choices of the $H_k$, the MH proposal and $p_{ee}$ are most
crucial for obtaining an efficient EE sampler.
In addition, the choice of these parameters is problem-dependent. To achieve 
fast convergence and good mixing, the EE sampler requires extensive tuning 
of the $H_k$, the MH proposal and $p_{ee}$ in particular. A general sampler is
designed to be "black box" in the sense that the user need not tune the
sampler to the problem. Some attempts have been made to develop such
"black-box" samplers in the literature. Neal [4] developed variations on slice
sampling that can be used to sample from any continuous distribution and that
require little or no tuning. Chen and Schmeiser [2] proposed the
random-direction interior-point (RDIP) sampler. RDIP samples from the uniform
distribution defined over the region $U = \{(x, y) : 0 < y < \pi(x)\}$ below
the curve of the surface defined by $\pi(x)$, which is essentially the same
idea used in slice sampling.
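
To make the idea of sampling uniformly under the curve of $\pi(x)$ concrete, the following is a minimal one-dimensional slice sampler with stepping-out and shrinkage. It is a sketch for illustration, not the RDIP algorithm and not a reproduction of any particular variant in [4]. The function name `slice_sample` and the initial width `w` are assumptions.

```python
import random


def slice_sample(log_f, x0, n, w=1.0, rng=None):
    """Draw n dependent samples from a 1-D density proportional to exp(log_f)."""
    rng = rng or random.Random(0)
    x, xs = x0, []
    for _ in range(n):
        # Auxiliary height y ~ U(0, f(x)), handled on the log scale.
        log_y = log_f(x) - rng.expovariate(1.0)
        # Stepping out: grow an interval of width w until both ends leave the slice.
        left = x - w * rng.random()
        right = left + w
        while log_f(left) > log_y:
            left -= w
        while log_f(right) > log_y:
            right += w
        # Shrinkage: sample uniformly, shrinking the interval after each rejection.
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        xs.append(x)
    return xs
```

The only tuning constant is the initial width `w`, and the stepping-out loop corrects a poor choice automatically, which is the sense in which such a sampler needs little tuning.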

3. Boundedness. It is not clear why the target distribution $\pi(x)$ must
be bounded. Is this a necessary condition required in Theorem 2? It appears
that the condition $\sup_x \pi(x) < \infty$ is used only in the construction
of the energy levels $H_k$ for $k > 0$, for convenience. Would it be possible
to relax such an assumption? Otherwise, the EE sampler cannot be applied to
sampling from an unbounded $\pi(x)$ such as a gamma distribution with shape
parameter less than 1.

If we rewrite

$$D_j = \{x : h(x) \in [H_j, H_{j+1})\} = \{x : \pi(x) \in (\exp(-H_{j+1}), \exp(-H_j)]\},$$

we can see that $D_0$ corresponds to the highest-density region. Thus, if $H_1$
is appropriately specified, and the guideline given in Section 3.3 is applied
to the choice of the rest of the $H_j$'s, the boundedness assumption on $\pi(x)$
may not be necessary.
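
The rewriting of $D_j$ takes one step, assuming the energy is $h(x) = -\log \pi(x)$ with any normalizing constant absorbed into $\pi$. The comment on the gamma case is an illustration under the additional assumption that $H_0$ is allowed to equal $-\infty$:

```latex
\begin{align*}
H_j \le h(x) < H_{j+1}
  &\iff H_j \le -\log \pi(x) < H_{j+1} \\
  &\iff \exp(-H_{j+1}) < \pi(x) \le \exp(-H_j).
\end{align*}
% Illustration (assumption: H_0 = -\infty is permitted).
% For a gamma density with shape parameter a < 1, \pi(x) \to \infty as x \to 0^+,
% so h(x) \to -\infty there. The top ring is still well defined:
%   D_0 = \{x : h(x) < H_1\} = \{x : \pi(x) > \exp(-H_1)\}.
```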

4. Efficiency. The proposed EE sampler requires $K(B + N)$ iterations
before it starts the lowest-order chain $\{X_n^{(0)}, n \ge 0\}$. Note that
here $B$ is the number of "burn-in" iterations and $N$ is the number of
iterations used in constructing an empirical energy ring $D_j$. As it is
difficult to determine how quickly a Markov chain $\{X^{(k)}\}$ converges, a
relatively large $B$ may be needed. If the chain $X^{(k)}$ does not converge,
the acceptance probability given in Section 3.1 for the equi-energy move at
energy levels lower than $k$ may be problematic. Therefore, the EE sampler is
quite inefficient, as a large number of "burn-in" iterations will be wasted.
This may be a particular problem when $K$ is large. Interestingly, the authors
never disclosed which values of $B$ and $N$ were used in their illustrative
examples. Thus, the choice of $B$ and $N$ should be discussed in Section 3.3.
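
The start-up cost $K(B + N)$ is easy to tabulate. The sketch below uses purely hypothetical values of $K$, $B$ and $N$, since the values used in the paper's examples are not reported. The function names are illustrative.

```python
def startup_iterations(K, B, N):
    """Iterations spent on higher-order chains before the lowest-order chain starts."""
    return K * (B + N)


def discarded_burn_in(K, B):
    """Portion of the start-up cost that is discarded as burn-in."""
    return K * B


# Hypothetical setting: 5 levels, 10,000 burn-in and 50,000 ring-building iterations.
total = startup_iterations(5, 10_000, 50_000)
wasted = discarded_burn_in(5, 10_000)
```

Both quantities grow linearly in $K$, so doubling the number of levels doubles the iterations spent before the target chain begins.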

5. Applicability in high-dimensional problems. Based on the guideline 
of the practical implementation provided in the paper, the number of energy 
levels K could be roughly proportional to the dimensionality of the target 
distribution. Thus, for a high-dimensional problem, K could be very large. 
As a result, the EE sampler may become more inefficient as more "burn-in"
iterations are required and, at the same time, it may be difficult to tune the
parameters involved in the EE sampler.

For example, consider a skewed link model for binary response data proposed
by Chen, Dey and Shao [1]. Let $(y_1, y_2, \ldots, y_n)'$ denote an $n \times 1$
vector of $n$ independent dichotomous random variables. Let
$x_i = (x_{i1}, \ldots, x_{ip})'$ be a $p \times 1$ vector of covariates. Also
let $(w_1, w_2, \ldots, w_n)'$ be a vector of independent latent variables.
Then, the skewed link model is formulated as follows: $y_i = 0$ if $w_i < 0$
and $1$ if $w_i \ge 0$, where $w_i = x_i'\beta + \delta z_i + \varepsilon_i$,
$z_i \sim G$, $\varepsilon_i \sim F$, $z_i$ and $\varepsilon_i$ are
independent, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ vector of
regression coefficients, $\delta$ is the skewness parameter, $G$ is a known
cumulative distribution function (c.d.f.) of a skewed distribution, and $F$ is
a known c.d.f. of a symmetric distribution. To carry out Bayesian inference
for this binary regression model with a skewed link, we need to sample from
the joint posterior distribution of $((w_i, z_i), i = 1, \ldots, n, \beta, \delta)$
given the observed data $D$. The dimension of the target distribution is
$2n + p + 1$. When the sample size $n$ is large, we face a high-dimensional
problem. Notice that the dimension of the target distribution can be reduced
considerably if we integrate out $(w_i, z_i)$ from the likelihood function.
However, in this case, the resulting posterior distribution
$\pi(\beta, \delta \mid D)$ contains many analytically intractable integrals,
which could make the EE sampler expensive or even infeasible to implement.
The skewed link model is only a simple illustration of a high-dimensional 
problem. Sampling from the posterior distribution under nonlinear mixed- 
effects models with missing covariates considered in [5] could be even more 
challenging. 
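
The latent-variable structure of the skewed link model, and the resulting dimension count, can be sketched as follows. The choices of $G$ (standard half-normal) and $F$ (standard normal) are assumptions made only so the example runs, since the text requires only that they be known. The function names are hypothetical.

```python
import random


def simulate_skewed_link(X, beta, delta, rng):
    """Simulate (y, w, z) from w_i = x_i' beta + delta * z_i + eps_i.

    Assumptions for illustration: z_i ~ half-normal (a skewed G) and
    eps_i ~ N(0, 1) (a symmetric F).
    """
    ys, ws, zs = [], [], []
    for x_i in X:
        z = abs(rng.gauss(0.0, 1.0))      # assumed skewed latent variable
        eps = rng.gauss(0.0, 1.0)         # assumed symmetric error
        w = sum(a * b for a, b in zip(x_i, beta)) + delta * z + eps
        ys.append(1 if w >= 0 else 0)     # y_i = 1 if w_i >= 0, else 0
        ws.append(w)
        zs.append(z)
    return ys, ws, zs


def posterior_dimension(n, p):
    """Dimension of ((w_i, z_i), i = 1..n, beta, delta): 2n + p + 1."""
    return 2 * n + p + 1
```

Even a moderate sample size, say $n = 200$ with $p = 2$ covariates, gives a target of dimension 403, which shows how quickly the number of latent coordinates comes to dominate.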

In contrast, the popular Gibbs sampler may be more attractive and per- 
haps more suitable for a high-dimensional problem because the Gibbs sam- 
pler requires only sampling from low-dimensional conditional distributions. 
As MH sampling can be embedded into a Gibbs step, would it be possible 
to develop an EE-within Gibbs sampler? 

6. Statistical estimation. In the paper, the authors proposed a sophisticated
but interesting Monte Carlo method to estimate the expectation
$E_{\pi_0}[g(X)]$ under the target distribution $\pi_0(x) = \pi(x)$ using all
chains from the EE sampler. Due to the nature of the EE sampler, the state
space $\mathcal{X}$ is partitioned according to the energy levels, that is,
$\mathcal{X} = \bigcup_{j=0}^{K} D_j$. Thus, this may be an ideal scenario for
applying the partition-weighted Monte Carlo method proposed by Chen and Shao
[3]. Let $\{X_i^{(0)}, i = 1, 2, \ldots, n\}$ denote the sample under the chain
$X^{(0)}$ ($T_0 = 1$). Then, the partition-weighted Monte Carlo estimator is
given by

$$\hat{E}_{\pi_0}[g(X)] = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=0}^{K} w_j \, g(X_i^{(0)}) \, 1\{X_i^{(0)} \in D_j\},$$

where the indicator function $1\{X_i^{(0)} \in D_j\} = 1$ if
$X_i^{(0)} \in D_j$ and $0$ otherwise, and $w_j$ is the weight assigned to the
$j$th partition. The weights $w_j$ may be estimated using the combined sample,
$\{X^{(k)}, k = 1, 2, \ldots, K\}$, under the $\pi_k$ for $k = 1, 2, \ldots, K$.
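
A partition-weighted average of this kind is simple to compute once each draw is assigned to its energy ring. The sketch below assumes the estimator takes the form of a sample mean in which each term $g(X_i)$ is multiplied by the weight of the ring containing $X_i$. How the weights are estimated is left open, and the names `partition_weighted_mean` and `ring_of` are hypothetical.

```python
def partition_weighted_mean(samples, g, ring_of, weights):
    """(1/n) * sum_i w_{j(i)} * g(x_i), where j(i) is the ring containing x_i.

    samples : draws from the lowest-order chain
    g       : function whose expectation is wanted
    ring_of : maps a draw to its ring index j (an assumed helper)
    weights : sequence of ring weights w_j
    """
    n = len(samples)
    return sum(weights[ring_of(x)] * g(x) for x in samples) / n
```

With all weights equal to 1 this reduces to the ordinary sample mean, so the weights can be read as ring-specific corrections to that baseline.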

7. Example 1. We consider sampling from a two-dimensional normal
mixture,

$$f(x) = \sum_{i=1}^{2} \frac{1}{4\pi} |\Sigma_i|^{-1/2} \exp\left[-\frac{1}{2}(x - \mu_i)' \Sigma_i^{-1} (x - \mu_i)\right], \tag{7.1}$$

where $x = (x_1, x_2)'$, $\mu_1 = (0, 0)'$, $\mu_2 = (5, 5)'$ and

$$\Sigma_i = \begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho_i \\ \sigma_1 \sigma_2 \rho_i & \sigma_2^2 \end{pmatrix},$$

with $\sigma_1 = \sigma_2 = 1.0$, $\rho_1 = 0.99$ and $\rho_2 = -0.99$. The
purpose of this example is to examine the performance of the EE sampler under
a bivariate normal distribution with a high correlation between $X_1$ and
$X_2$. Since the minimum value of the energy function $h(x) = -\log(f(x))$ is
around $\log(4\pi \sigma_1 \sigma_2 \sqrt{1.0 - \rho_1^2}) \approx 0.573$, we
took $H_0 = 0.5$. $K$ was set to 2. The energy ladder was set between
$H_{\min}$ and $H_{\min} + 100$ in a geometric progression, and the
temperatures were between 1 and 60. The equi-energy jump probability was taken
to be 0.1. The initial states of the chain $X^{(i)}$ were drawn uniformly from
$[0, 1]^2$. The MH proposal was taken to be bivariate Gaussian:
$X_{n+1}^{(i)} \sim N_2(X_n^{(i)}, \tau_i^2 T_i I_2)$, where the MH proposal
step size $\tau_i$ for the $i$th-order chain $X^{(i)}$ was taken to be 0.5
such that the acceptance ratio was in the range of (0.23, 0.29). The
overall acceptance rate for the MH move in the EE sampler was 0.26. We 
used 2000 iterations to burn in the EE sampler and then generated 20,000 
iterations. Figure 1 shows autocorrelations and the samples generated in 
each chain based on the last 10,000 iterations. We can see, from Figure 1, 
that the EE sampler works remarkably well and the high correlations do not 
impose any difficulty for the EE sampler at all. 
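
The quoted minimum energy can be checked numerically. The sketch below evaluates $h(x) = -\log f(x)$ for an equal-weight mixture of two bivariate normals, parameterized by component means, standard deviations and correlations. The function name and argument layout are assumptions made for illustration.

```python
import math


def mixture_energy(x, mus, sigmas, rhos):
    """h(x) = -log f(x) for an equal-weight two-component bivariate normal mixture."""
    f = 0.0
    for (m1, m2), (s1, s2), rho in zip(mus, sigmas, rhos):
        d1 = (x[0] - m1) / s1
        d2 = (x[1] - m2) / s2
        # Quadratic form (x - mu)' Sigma^{-1} (x - mu) in standardized coordinates.
        q = (d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / (1.0 - rho * rho)
        # |Sigma|^{1/2} = s1 * s2 * sqrt(1 - rho^2); the weight 1/2 gives 1/(4 pi).
        det_sqrt = s1 * s2 * math.sqrt(1.0 - rho * rho)
        f += math.exp(-0.5 * q) / (4.0 * math.pi * det_sqrt)
    return -math.log(f)


# Energy at the first mode for the Example 1 settings.
h_mode = mixture_energy((0.0, 0.0), [(0, 0), (5, 5)], [(1, 1), (1, 1)], [0.99, -0.99])
```

At the mode $(0, 0)$ the second component contributes essentially nothing, so the energy reduces to $\log(4\pi\sqrt{1 - 0.99^2})$, in line with the value used to set $H_0$.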

Fig. 1. Plots of EE samples from a normal mixture distribution with equal variances.

8. Example 2. In this example, we consider another extreme and more
challenging case, in which we assume a normal mixture distribution with
different variances. Specifically, in (7.1) we take

$$\Sigma_i = \begin{pmatrix} \sigma_{i1}^2 & \sigma_{i1} \sigma_{i2} \rho_i \\ \sigma_{i1} \sigma_{i2} \rho_i & \sigma_{i2}^2 \end{pmatrix},$$

with $\sigma_{11} = \sigma_{12} = 0.01$, $\sigma_{21} = \sigma_{22} = 1.0$ and
$\rho_1 = \rho_2 = 0$. Since the minimum value of the energy function $h(x)$
is around $-6.679$, we took $H_0 = -7.0$. We first tried the same setting for
the energy and temperature ladders with $K = 2$, $p_{ee} = 0.1$ and the MH
proposal $N_2(X_n^{(i)}, \tau_i^2 T_i I_2)$. The chain was trapped around one
mode and did not move from one mode to another at all. A similar result was
obtained when we set $K = 4$. So, it did not help to simply increase $K$. One
potential reason for this may be the choice of the MH proposal
$N_2(X_n^{(0)}, \tau_0^2 I_2)$ at the lowest energy level. If $\tau_0$ is
large, a candidate point around the mode with a smaller variance is likely to
be rejected. On the other hand, the chain with a small $\tau_0$ may move more
frequently, but the resulting samples will be highly correlated.
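
A rough calculation shows how severe this step-size trade-off is. For a random-walk proposal $Y \sim N_2(x, \tau^2 I_2)$, the squared distance $\|Y - x\|^2 / \tau^2$ is chi-square with 2 degrees of freedom, so the chance of landing within radius $r$ of the current point is $1 - \exp(-r^2 / (2\tau^2))$. The sketch below uses $r = 0.03$ (three standard deviations of the narrow component) as a heuristic for staying near that mode. The radius and the function name are assumptions.

```python
import math


def prob_within_radius(tau, r):
    """P(||Y - x|| <= r) for Y ~ N_2(x, tau^2 I_2), via the chi-square(2) c.d.f."""
    return -math.expm1(-r * r / (2.0 * tau * tau))


# Step size 0.5 (as in Example 1) versus a step matched to the narrow mode.
p_large_step = prob_within_radius(0.5, 0.03)
p_small_step = prob_within_radius(0.01, 0.03)
```

With $\tau = 0.5$ fewer than 0.2% of the proposals made from the center of the narrow mode stay within three of its standard deviations, so nearly all of them fall where the density is negligible and are rejected. With $\tau = 0.01$ almost all proposals stay nearby, at the price of very small moves.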

Intuitively, an improvement could be made by increasing $K$, tuning the
energy and temperature ladders, choosing a better MH proposal and a more
appropriate $p_{ee}$. Several attempts along these lines were made to improve
the EE sampler and the results based on one of those trials are given below.
In this attempt, $K$ was set to 6, and $H_1 = \log(4\pi) + a = 2.53 + a$,
where $a$ was set to 0.6. The energy ladder was set between $H_1$ and
$H_{\min} + 100$ in a geometric progression, the temperatures were between 1
and 70, and $p_{ee} = 0.5$. The MH proposals were specified as
$N_2(X_n^{(i)}, \tau_i^2 T_i I_2)$ for $i > 0$ and
$N_2(\mu(X_n^{(0)}), \Sigma(X_n^{(0)}))$ at the lowest energy level, where
$\mu(X_n^{(0)})$ was chosen to be the mode of the target distribution based
upon the location of the current point $X_n^{(0)}$ and $\Sigma(X_n^{(0)})$ was
specified in a similar fashion as $\mu(X_n^{(0)})$. We used 20,000 iterations
to burn in the EE sampler and then generated 50,000 iterations. Figure 2 shows
the plots of the samples generated in $X^{(0)}$ based on all 50,000
iterations. The resulting chain had excellent mixing around each mode, and the
chain also did move from one mode to another mode. However, the chain did not
move as freely as expected.

Due to lack of experience in using the EE sampler, we are not sure at this 
moment whether the EE sampler can be further improved for this example. 
If so, we do not know how. We would like the authors to shed light on this. 
9. Discussion. The EE sampler is a potentially useful and effective tool
for sampling from a multimodal distribution. However, as shown in Example 
2, the EE sampler did experience some difficulty in sampling from a bivariate 
normal distribution with different variances. For the unequal variance case, 
the guidelines for practical implementation provided in the paper may not be 
sufficient. The statement, "the sampler can jump freely between the states
with similar energy levels," may not be accurate either.

As a uniform proposal was suggested for the equi-energy move, it becomes 
apparent that the points around the modes corresponding to larger variances 
are more likely to be selected than those corresponding to smaller variances. 
Initially, we thought that an improvement might be made by assigning a 
larger probability to the points from the mixand with a smaller variance. 
However, this would not work as the resulting acceptance probability would 
become small. Thus, a more likely selected point may be less likely to be 
accepted. It does appear that a uniform proposal may be a good choice for 
the equi-energy move. 